Sub queries vs CTE:


Welcome to Insightninja, This is part 7 of 'SQL for data analytics' series.
Click here to catch up on the previous 6 parts. Here we are going to settle up the debate of CTE vs Sub queries in SQL. What is best suited for a certain situation? Let's figure out. 

We are going to use STATS table for all our examples in this blog.


Here's table creation script.

CREATE TABLE STATS    (  

    Match_No        INT                 PRIMARY KEY,

    Opposition      VARCHAR(50)         Not null,

    RunsScored      INT                 DEFAULT 0,

    BallFaced       INT,

    StrikeRate      FLOAT,

    MatchDate       DATE,

    Venue           VARCHAR(100))

 

INSERT INTO STATS (Match_No,Opposition,RunsScored,BallFaced,StrikeRate,MatchDate,Venue)

    VALUES

    (1,'South Africa',18,34,52.94 ,'2019-06-05','Southampton'),

    (2,'Australia'   ,82,77,106.49,'2019-06-09','The Oval'),

    (3,'Pakistan'    ,77,65,118.46,'2019-06-16','Manchester'),

    (4,'Afghanistan' ,67,63,106.34,'2019-06-22','Southampton'),

    (5,'West Indies' ,72,82,87.80 ,'2019-06-27','Manchester'),

    (6,'England'     ,66,76,86.84 ,'2019-06-30','Birmingham'),

    (7,'Bangladesh'  ,26,27,96.29 ,'2019-07-02','Birmingham'),

    (8,'Sri Lanka'   ,34,41,82.93 ,'2019-07-06','Leeds'),

    (9,'New Zealand' ,01,06,16.16 ,'2019-07-09','Manchester')


Sub Queries in SQL:

Subqueries in SQL are queries that are nested within another query. They are used to retrieve data that will be used in the context of the outer query. Subqueries can be categorized into several types based on where they are used and how they contribute to the overall query. Here are the main types of subqueries:

 

1. Single-Row Subquery:

   A single-row subquery returns a single value or row, which is then used for comparison or evaluation in the outer query. 

Example:

SELECT opposition

FROM STATS

WHERE Match_No = (SELECT Match_No FROM STATS WHERE MatchDate ='2019-07-09');

O/P:


Here, Inner query will return single ‘match_no’  value, which then will be used in where clause of outer query.

2. Multi-Row Subquery:

   A multi-row subquery returns multiple rows, which are used in comparison or evaluation with the outer query.

Example:  

SELECT opposition

FROM STATS

WHERE Match_No IN (SELECT Match_No FROM STATS WHERE Venue ='Manchester' );

O/P:


Here, Inner query returns 3 values which are used as arguments for IN operator of outer query.

 

3. Correlated Subquery:

   A correlated subquery references columns from the outer query within the subquery. The subquery is executed for each row processed by the outer query.

 Example:

   SELECT Opposition,

   (SELECT AVG(RunsScored) FROM STATS WHERE Opposition != d.Opposition) AS Avg_runs

   FROM STATS d;

O/P:


Here, Inner query is part of outer query as we are using reference of table from outer query. This is one of the example that resembles to self join. In result we get opposition names and average runs, where score against  that particular Opposition’s is excluded

4. Nested Subquery:

   A nested subquery is a subquery that is placed within another subquery. This allows for more complex conditions and comparisons.

Example:

SELECT Opposition

FROM STATS

WHERE RunsScored > (SELECT avg(RunsScored) FROM STATS WHERE MatchDate in 

(SELECT MatchDate FROM STATS WHERE venue != 'Manchester'));

O/P:


Here we have 2 inner queries and an outer main query. second inner query returns dates of matches not played in ‘Manchester’, using this list of dates we find average runs scored in second inner query and finally in outer query we are finding Opposition with more runs than average.

 

5. Subquery as a result set:

  This type of subqueries are used as a temporary result block for comparison. 

   SELECT A.Opposition,A.Venue,B.MatchDate from

   (SELECT Opposition,Venue FROM STATS) AS A

   inner join

   (SELECT Opposition,MatchDate FROM STATS where BallFaced>50) AS B

   on A.Opposition=B.Opposition

O/P:


Here, We have created to independent result block using physical table and named them A and B respectively. Then we have fetched few columns from resultant blocks.

Subqueries are a powerful tool in SQL that allow you to perform various types of comparisons, filtering, and data retrieval. However, they should be used judiciously, as overly complex or deeply nested subqueries can lead to performance issues and decreased code readability. In some cases, using other techniques like joins or Common Table Expressions (CTEs) might be more efficient and maintainable. 

 

Common Table Expression (CTE):

CTE is a Temporary named result set that can be referenced within a query. CTEs are primarily used in SQL queries to simplify complex queries, improve readability, and avoid code duplication. 

There are two main types of CTEs:

 let's work through both a non-recursive and a recursive Common Table Expression (CTE) example using  cricket match statistics data. 

Non-Recursive Common Table Expression (nrCTE):

In this example, we'll create a non-recursive CTE to calculate the total runs scored across all matches.

Example:

WITH TotalRunsCTE AS (

    SELECT

        SUM(RunsScored) AS TotalRuns

    FROM STATS

)

SELECT * FROM TotalRunsCTE;

O/P:


In this non-recursive CTE, we're calculating the sum of runs scored across all matches in the provided data.

We can create multiple result sets in single CTE, Let’s see how.

WITH TotalRunsCTE AS (

    SELECT

        Opposition,RunsScored AS TotalRuns

    FROM STATS WHERE Venue<>'Leeds'

)

,FinalCTE AS (

    SELECT

        Opposition,TotalRuns AS TotalRuns

    FROM TotalRunsCTE WHERE TotalRuns>=(SELECT AVG(TotalRuns) FROM TotalRunsCTE)

)

SELECT * FROM FinalCTE;

O/P:


Here, In first block named TotalRunsCTE we are getting oppostition and runs column from all the venues except leeds, We are using first block for querying in second block called FinalCTE where we want opposition and total runs where total runs are greater than average runs from TotalRunsCTE.

Then finally selecting everything from FinalCTE. We can explicitly mention column names here.

 

Recursive Common Table Expression (rCTE):

A recursive CTE is used when you need to create a query that refers to its own results.

It's often used to work with hierarchical or recursive data structures like organizational charts or hierarchies in databases.

Here's how a recursive CTE works:

Anchor Member:

The anchor member is the initial SELECT statement that defines the base case for the recursion. It's the starting point of the recursion.

Recursive Member:

The recursive member follows the anchor member and references the CTE itself in the FROM clause. This is where the recursion happens. It combines the results of the previous iteration with new data to build the next level of the hierarchy.

Termination Condition:

The recursion continues until a termination condition is met. This condition is typically specified in the WHERE clause of the recursive member and defines when the recursion should stop.

Result Set:

The final result set is composed of all the iterations performed during the recursion, including the anchor member and all the recursive members.

In this example, we'll create a recursive CTE to calculate the running total of cumulative runs for each match.

WITH RecursiveCTE AS (

    SELECT Match_No,Opposition,RunsScored,BallFaced,StrikeRate,MatchDate,Venue,

           RunsScored AS CumulativeRuns

    FROM #STATS

    WHERE Match_No = 1 -- Anchor member

     UNION ALL 

    SELECT s.Match_No,s.Opposition,s.RunsScored,s.BallFaced,s.StrikeRate,s.MatchDate,s.Venue,

           r.CumulativeRuns + s.RunsScored AS CumulativeRuns

    FROM #STATS s

    INNER JOIN RecursiveCTE r ON s.Match_No = r.Match_No + 1 -- Recursive member

)

SELECT * FROM RecursiveCTE;

O/P:


In this recursive CTE, we're calculating the running total of cumulative runs scored in each match. The anchor member selects the statistics for the first match, and the recursive member adds the runs scored for the current match to the cumulative runs scored in the previous match. 

Please note that in this specific case, using a recursive CTE might not be the most appropriate approach, as cricket match statistics data isn't hierarchical or recursive in nature. Recursive CTEs are usually used for hierarchical data like organizational structures. However, this example serves to demonstrate the concept of recursive CTEs using your data. 

 

Head to Head Comparison:

Aspect

Common Table Expressions (CTEs)

Subqueries

Purpose

Temporary named result set that can be referenced within a query.

Nested queries used to retrieve data for comparison, filtering, etc.

Reusability

Can be referenced multiple times within a query.

Typically used once within a single query.

Readability

Improves query structure and readability, especially for complex queries.

May lead to more complex and nested query structure.

Recursive Operations

Ideal for hierarchical or recursive operations using recursive CTEs.

Generally not used for recursive operations.

Performance

May be optimized by the database engine; performance can vary.

Performance can degrade with deeply nested subqueries.

Clarity and Maintenance

Enhances code maintainability by separating logical steps.

May become harder to maintain in complex queries.

Use Cases

Well-suited for complex queries, recursive operations, and multiple references.

Suitable for simple comparisons and filtering.

Syntax

Uses the WITH keyword to define the CTE before the main query.

Nested SELECT statements within the main query.

 

Remember that the choice between CTEs and subqueries depends on the specific requirements of your query and your preference for code readability and maintainability. In some cases, a combination of both techniques might be the best approach to achieving your desired results while keeping the code understandable.

Hope you have gained confidence over Subqueries and CTEs. Please provide your feedback in comment section.

Happy Learning.